ABSTRACT

The Hierarchical Timestamping Algorithm is proposed for handling database concurrency
control. By analyzing transaction conflicts and partitioning the database into hierarchical
partitions, which transactions access discriminately using different synchronization protocols,
the algorithm can offer a significant performance gain. It also reduces the need for transactions
to leave traces (e.g., locks, timestamps) when accessing a data element.

It  is  shown  that  the  algorithm  is  correct  in  terms  of  transaction  serializability.  This  is  done 
by  showing  that  the  algorithm  enforces  a  topological  order  among  transactions.  This  research 
advocates  the  potential  benefit  of  application  analysis  in  enhancing  performance  of  concurrency 
control  algorithms  where  the  level  of  concurrency  is  vital  to  system  performance. 


1.    Introduction 

The two basic approaches to database concurrency control are the two-phase locking
technique [Eswaran76] and the timestamp ordering technique [Bernstein80, Reed78]. Both techniques
endorse serializability as their criterion of correctness. While it has the advantages of being
deadlock free and of needing no unlock instructions when a transaction is finished, the
timestamping algorithm has the drawback of rigidly obeying the order of the timestamps that have
been assigned to transactions, without much consideration of the static or the dynamic nature of
actual interferences among transactions. In this paper, the Hierarchical Timestamping (HTS)
approach is described, which makes unique use of the nature of the application conflicts. By
decomposing the database into partitions which the transactions access discriminately, the
approach can offer a significant performance gain in handling database concurrency control.
Equally important is its prospect of reducing the need for a transaction to leave a "trace"
(e.g., a lock or a timestamp) when accessing data elements.

Conflict analysis among transactions has been proposed in the research of SDD-1
[Bernstein80, Bernstein81] as a vehicle to discover, a priori, certain (static) conflict patterns
among transaction classes that may enable a more flexible timestamp protocol (e.g., Protocol 1 in
SDD-1's terminology) to be used. However, with an orientation towards a distributed database
system where timestamps are not stapled with data elements, the SDD-1 approach stops short of
developing a more generalized timestamping algorithm that takes better advantage of information
that may be provided via conflict analysis. Along a different dimension, a novel method has been
presented in [Viemont82] which blends timestamp ordering and two-phase locking in one and
chooses to switch to one or the other at the most opportune time so as to increase the level of
concurrency. The multi-version timestamping algorithm has also been developed and shown to
provide a higher level of concurrency than the conventional single-version timestamping algorithm
[Reed78, Reed79, Bernstein83]. Theoretical aspects of multi-version databases are discussed in
[Bernstein83, Papadimitriou84], while one-previous-version concurrency control methods are
presented in [Bayer80, Garcia-Molina82] and multiple-previous-version methods in [Stearns81,
Chan82, Chan85]. Simulation studies of multi-version methods have been reported in [Harder86,
Carey86] and the results are generally favorable.

Using conflict analysis and multi-version databases as vehicles, the algorithm proposed in this
paper, called the Hierarchical Timestamping Algorithm, or the HTS algorithm, allows a
transaction to use an "array" of timestamps, rather than a single timestamp, to synchronize its
accesses. Which timestamp a transaction will use to synchronize an access to a particular data
element depends on which "data partition" the data element belongs to. The method is based on the
hierarchical database decomposition proposed in [Hsu83]. In comparison, the structural locking
protocol [Silberschatz80, Kedem83] is a non-two-phase locking protocol which aims at increasing
the level of concurrency by reducing the amount of time the locks on the high-level nodes of a
tree must be held by each transaction. It requires that the transactions access the nodes of the
tree in a certain sequence, and it involves lock and unlock protocols for each access to the
high-level nodes of the tree. The hierarchy referred to in their study is entirely different from
the kind of hierarchy discussed in this paper.

Before going into the detailed mechanism of the algorithm, the basic idea behind the
algorithm can be illustrated by referring to an example schedule of read and write steps from
three transactions $t_1$, $t_2$, and $t_3$:

io{\V,b),  t^(R.a),  t^R.b).  ^(U•,6),  t^(\V.c) 
where $t_i(R,a)$ refers to a request from transaction $t_i$ to read data element a, $t_i(W,b)$
to a request to write data element b, and so on. Suppose the timestamp of $t_1$ is smaller than
that of $t_2$. Using the basic timestamping algorithm, $t_1$ will be denied permission to write
data element b (since at that time b is stamped with a read timestamp of $t_2$, which is greater
than the timestamp of $t_1$), and forced to abort. Under our proposed method, however, if the data
partition containing data element a is related to that containing data element b in a certain way
(the exact nature to be explained later), then the algorithm allows transaction $t_2$ to access b
with a "pseudo" read timestamp TS earlier than $t_1$'s timestamp, so that $t_1$'s write request
can proceed without being delayed or aborted. Furthermore, this pseudo read timestamp TS may be
engineered to be early enough that no transaction will any longer write b with an even earlier
timestamp, eliminating the need to leave TS with b as a read timestamp. Therefore this read
access can be accomplished without having to leave any "trace" for concurrency control purposes.
With these properties, the proposed algorithm has the potential of increasing the level of
concurrency while reducing the costly overhead of leaving access traces.

The organization of this paper is as follows. In the following section, the Hierarchical
Timestamping Algorithm is described in detail. The proof of correctness is given in section three.
In section four, we discuss the optimality aspect of the algorithm. Section five concludes the
paper.

2.    The  Hierarchical  Timestamping  Algorithm 

The HTS Algorithm requires the decomposition of a database into a number of data
partitions. We construct a data partition hierarchy which is basically a partial order of the data
partitions subject to certain constraints. The Hierarchical Timestamping Algorithm is a
timestamp-based concurrency control algorithm parameterized by a chosen data partition hierarchy
and its corresponding transaction classification. When the database decomposition consists of a
single data partition, the HTS Algorithm degenerates to the conventional multi-version
timestamping algorithm.

In this section we first discuss the method of transaction analysis and the mechanism
for constructing a database partition hierarchy. Then in Section 2.2 we present the algorithm
itself.

2.1.    Transaction  Analysis 

Let a database be decomposed into a number of data partitions. The purpose of transaction
analysis in our algorithm is to facilitate correct construction of a data partition hierarchy on
which the hierarchical timestamp protocol is based. The data partition hierarchy, constructed
off-line, is basically a partial order of the data partitions subject to additional
graph-theoretic constraints and constraints related to transaction analysis. Note that given any
database partition and any set of transactions, a correct data partition hierarchy can be found
and the hierarchical timestamp protocol is applicable. However, the performance and optimality of
the protocol (e.g., whether it can outperform the basic multi-version timestamp algorithm) depend
on an intelligent choice of database decomposition and data partition hierarchy. In this section,
the mechanics of constructing a correct data partition hierarchy are discussed.

2.1.1.    Transaction  analysis  against  a  database  partition 

Analysis of a set of update transactions $T_u$ against a database consisting of a set of data
partitions results in a data partition graph, where the nodes are the data partitions, and the
arcs are assigned in such a way that there is an arc from a data partition $D_i$ to another data
partition $D_j$ if and only if one can find a potential transaction in the database system that
updates data elements in $D_i$ and accesses (i.e., reads or writes) data elements in $D_j$. That
is, $D_i \to D_j$, $i \ne j$, indicates that there exist transactions in the system that would
potentially link updates of data elements in $D_i$ to the content of data elements in $D_j$.

Definition. Let $T_u$ be a set of update transactions to be performed on a database D. Let P
be a decomposition of D into data partitions $D_1, D_2, \ldots, D_n$. A data partition graph of P
w.r.t. $T_u$ is a digraph denoted as $DPG(P, T_u)$ with nodes corresponding to the data partitions
of P and a set of directed arcs joining these nodes such that, for $i \ne j$, $D_i \to D_j$ iff
there exists a transaction $t \in T_u$ s.t. $w(t) \cap D_i \ne \emptyset$ and
$a(t) \cap D_j \ne \emptyset$, where $w(t)$, $r(t)$ and $a(t)$ are the write set, the read set
and the access set of transaction t. (The access set $a(t)$ is the union of $r(t)$ and $w(t)$.)
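The definition above can be turned into a few lines of code. The following is a minimal sketch, assuming that each transaction is given as a pair of read and write sets and that a dictionary (here called `partition_of`, a name introduced only for illustration) maps each data element to its partition.

```python
from itertools import product


def data_partition_graph(partition_of, transactions):
    """Return the arc set of DPG(P, T_u) as (D_i, D_j) pairs.

    partition_of: dict mapping a data element to the name of its partition.
    transactions: iterable of (read_set, write_set) pairs, one per update
                  transaction (hypothetical input format).
    """
    arcs = set()
    for read_set, write_set in transactions:
        written = {partition_of[x] for x in write_set}
        accessed = written | {partition_of[x] for x in read_set}
        # Arc D_i -> D_j (i != j) iff t writes into D_i and accesses D_j.
        arcs |= {(di, dj) for di, dj in product(written, accessed) if di != dj}
    return arcs


partition_of = {"a": "D1", "b": "D1", "c": "D2"}
# A transaction that reads b (in D1) and writes c (in D2).
arcs_read_write = data_partition_graph(partition_of, [({"b"}, {"c"})])
# A transaction that writes both a (in D1) and c (in D2).
arcs_two_writes = data_partition_graph(partition_of, [(set(), {"a", "c"})])
```

Note that a transaction writing into two partitions contributes arcs in both directions, so a data partition graph may contain cycles.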

The data partition graph is a tool for capturing the pattern of conflict among transactions
to be run in the system. Note that, in our algorithm, there is no need for read-only transactions
to participate in the transaction analysis, eliminating the difficulties of pinning down, a
priori, the nature of all ad hoc queries.


2.1.2.  Constructing  a  data  partition  hierarchy 

Once  a  data  partition  graph  is  defined  for  a  database,  a  data  partition  hierarchy,  which 
assigns  a  partial  order  to  the  set  of  data  partitions,  can  be  derived  as  follows. 

Definition. Given a data decomposition P and a data partition graph $DPG(P, T_u)$, a data
partition hierarchy, denoted as $DPH(P, T_u)$, is any acyclic digraph whose nodes are the data
partitions in P and whose arcs between partitions, denoted as $\to$, are such that

(1)  $DPH(P, T_u)$ is a semi-tree (a semi-tree is an acyclic digraph where there exists at most
one path between any pair of nodes), and

(2)  if there exists a directed path between $D_i$ and $D_j$ in $DPG(P, T_u)$, then there exists
a directed path between $D_i$ and $D_j$ in $DPH(P, T_u)$.

Note that a data partition hierarchy is necessarily a transitive reduction: there are no
transitive arcs in $DPH(P, T_u)$. There may be multiple data partition hierarchies that satisfy the
above definition given a database decomposition P and a data partition graph $DPG(P, T_u)$.
However, given any P and $T_u$, the existence of a DPH is guaranteed. This directly follows from
the fact that the transitive reduction of any acyclic digraph G of the data partitions in P that
is a total order of the data partitions satisfies the definition of $DPH(P, T_u)$.

In the remainder of the paper, the notation DPH refers to a particular data partition
hierarchy chosen as the basis of our hierarchical timestamp protocol. We say that a data partition
$D_i$ is higher than a data partition $D_j$, denoted as $D_i > D_j$, if there exists a directed
path $D_j \to \cdots \to D_i$ in DPH. In general, we say that data partition $D_i$ and data
partition $D_j$ are related if either $D_i = D_j$ or $D_i$ and $D_j$ are connected by a directed
path.
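As an illustration, the "higher than" and "related" tests reduce to reachability in DPH. The sketch below assumes DPH is stored as a set of arcs `(from, to)`; the four-partition hierarchy at the end is a made-up example, not one taken from the paper.

```python
def reachable(arcs, src, dst):
    """True if a directed path src -> ... -> dst exists (with src != dst)."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for a, b in arcs:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def is_higher(dph_arcs, di, dj):
    """D_i > D_j: a directed path D_j -> ... -> D_i exists in DPH."""
    return reachable(dph_arcs, dj, di)


def related(dph_arcs, di, dj):
    """D_i and D_j are equal or connected by a directed path in DPH."""
    return di == dj or is_higher(dph_arcs, di, dj) or is_higher(dph_arcs, dj, di)


# Hypothetical hierarchy: D3 -> D2 -> D1 and D4 -> D1.
dph = {("D3", "D2"), ("D2", "D1"), ("D4", "D1")}
d1_above_d3 = is_higher(dph, "D1", "D3")
d3_d4_related = related(dph, "D3", "D4")
```

In this example D3 and D4 both lie below D1 but are not related to each other, since no directed path connects them.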

2.1.3.  Transaction  Classification 

Given a data partition hierarchy DPH, it is possible to classify transactions in $T_u$ based on
the data partition(s) they write into. Based on the definition of data partition hierarchy, if
there exists a transaction $t \in T_u$ which writes into more than one data partition, then all the
data partitions that it writes into must be connected by a directed path in the data partition
hierarchy. Therefore, the rule for transaction classification is as follows:

Assign each update transaction t to the highest data partition $D_i$ it writes into.
We denote t being assigned to $D_i$ as $t \in D_i$, and $D_i$ is called the home data partition
of t.
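A minimal sketch of the classification rule follows. It assumes a predicate `is_higher(x, y)` implementing $D_x > D_y$ for the chosen DPH; the rank table used to build that predicate here is a hypothetical stand-in for a chain-shaped hierarchy D3 -> D2 -> D1.

```python
def home_partition(write_partitions, is_higher):
    """Return the highest partition among those a transaction writes into.

    Because the written partitions lie on one directed path of DPH, they are
    totally ordered by `is_higher`, so a single pass finds the maximum.
    """
    home = None
    for p in write_partitions:
        if home is None or is_higher(p, home):
            home = p
    return home


# Hypothetical chain D3 -> D2 -> D1, so D1 is the highest partition.
rank = {"D1": 3, "D2": 2, "D3": 1}


def chain_is_higher(x, y):
    return rank[x] > rank[y]


home = home_partition({"D2", "D3"}, chain_is_higher)
```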

The  following  property  establishes  that  the  data  partition  that  contains  a  data  element  that 
a  transaction  accesses  must  be  related  to  the  home  data  partition  of  that  transaction  in  the  data 
partition  hierarchy.  This  property  completes  the  background  for  establishing  the  hierarchical 
timestamp  protocol  to  be  described  in  the  next  section. 

Property. Let $t \in D_i$. Then the data partition that contains a data element that t accesses
(reads or writes) is related to $D_i$ in DPH.

2.2.    The  Hierarchical  Timestamp  Protocol 

The hierarchical timestamp protocol (HTS) is a concurrency control algorithm parameterized
by a chosen data partition hierarchy DPH and the corresponding transaction classification. Like
the conventional multi-version timestamp algorithm (MVTS), the HTS algorithm assigns a
timestamp to a transaction t when it initiates. We will call this timestamp the initiation
timestamp of t, denoted as $I(t)$. However, the timestamp the transaction actually uses to
synchronize its accesses may be different from $I(t)$. In particular, the timestamp used to access
data elements in data partitions higher than t's home partition will normally be somewhat smaller
than $I(t)$, while the reverse is true for accesses to lower partitions. The exact mechanism used
to calculate these "access timestamps" will be such that the overall serializability is still
preserved.

2.2.1.    Basic  Definitions 

Two functions, called the Decrement function, or the function DEC, and the Increment
function, or the function INC, are devised to compute the timestamps a transaction uses to access
data partitions different from its own home partition. Function DEC is used for computing
timestamps for accessing higher data partitions, and INC for lower partitions. These definitions
assume that a data partition hierarchy DPH is given.

Two values, $I(t)$ and $C(t)$, are associated with each transaction t in the system. We
assume that the timestamp assigned to a transaction is its initiation time, denoted as $I(t)$.
Therefore every data access request issued by a transaction must occur after $I(t)$. Each
transaction is also assigned a commit timestamp $C(t) > I(t)$. In typical cases, $C(t)$ is the
time when t actually finishes; i.e., no data access request would normally be issued by
transaction t after time $C(t)$. However, there are exceptional occasions (as explained later for
the procedure ProcC) when $C(t)$ is forced to be a smaller value than the time when t actually
finishes. It is sufficient for our purpose to have $I(t) < C(t)$.

Definition. A transaction t is active at time m if $I(t) \le m$ and $C(t) \ge m$.

Definition. The function $I_i$, defined for a data partition $D_i$, maps a time m to another
time m', i.e., $m' = I_i(m)$, such that m' is the initiation time of the oldest active transaction
assigned to $D_i$ at time m. Formally,

$$
I_i(m) =
\begin{cases}
m & \text{if there exists no } t \in D_i \text{ active at time } m, \\
\min\{I(t)\} & \text{otherwise, over all } t \in D_i \text{ active at time } m.
\end{cases}
$$

Definition. Let the Decrement function $DEC_{i,j}$ be a function defined for a pair of data
partitions $D_i$ and $D_j$, where $D_j \ge D_i$. $DEC_{i,j}$ recursively maps a time to another
time as follows.

$$
DEC_{i,j}(m) =
\begin{cases}
m & \text{if } D_i = D_j, \\
I_j(m) & \text{if } D_i \to D_j \text{ is in DPH}, \\
DEC_{k,j}(DEC_{i,k}(m)) & \text{otherwise, where } D_i \to D_k \to \cdots \to D_j \text{ is in DPH}.
\end{cases}
$$

That is, the function $DEC_{i,j}$ maps a time m in $D_i$ to the initiation time, $DEC_{i,j}(m)$,
of successively (i.e., along the critical path of DPH) the oldest active transaction assigned to
$D_j$. For example, if the critical path between $D_i$ and $D_j$ is $D_i \to D_k \to D_j$, then
$DEC_{i,j}(m) = I_j(I_k(m))$.
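The recursion for DEC amounts to applying the function I once for every partition on the path after the first. The sketch below assumes each partition's transactions are available as `(I, C)` pairs and treats a transaction as active at m when I <= m <= C; both the data layout and the inclusive bounds are assumptions made for illustration.

```python
def oldest_active_start(txns, m):
    """I_i(m): txns is a list of (I, C) pairs for transactions assigned to D_i."""
    # Assumption: t is active at m when I(t) <= m <= C(t).
    active = [i for i, c in txns if i <= m <= c]
    return min(active) if active else m


def dec(path, txns_by_partition, m):
    """DEC_{i,j}(m) for path = [D_i, ..., D_j] in DPH, D_j being the higher end."""
    for partition in path[1:]:
        m = oldest_active_start(txns_by_partition[partition], m)
    return m


# Hypothetical transactions on the path Di -> Dk -> Dj.
txns = {"Dk": [(5, 20)], "Dj": [(2, 9), (4, 30)]}
ts = dec(["Di", "Dk", "Dj"], txns, 12)  # I_j(I_k(12)) = I_j(5) = 2
```

With a one-element path the loop body never runs, which matches the case $D_i = D_j$.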

We will now describe the functions $C_i$ and $INC_{i,j}$, which can be considered conceptually
the inverse of functions $I_i$ and $DEC_{i,j}$.

Definition. Let $C_i : m \to m'$ be a function which maps a time m to another time m', where
$D_i$ is a data partition and $C_i(m)$ is determined as follows.

$$
C_i(m) =
\begin{cases}
m & \text{if there exists no } t \in D_i \text{ active at time } m, \\
\max\{C(t)\} & \text{otherwise, over all } t \in D_i \text{ active at time } m.
\end{cases}
$$

That is, $C_i(m)$ is the latest commit time of all transactions assigned to $D_i$ that started
before or at time m.

While  the  DEC  function  maps  a  time  in  a  lower  partition  to  the  initiation  time  of  some 
transaction  assigned  to  a  higher  partition,  the  INC  function  maps  a  time  in  a  higher  partition  to 
the  commit  time  of  some  transaction  assigned  to  a  lower  partition: 

Definition. The Increment function, defined for a pair of data partitions $D_i$ and $D_j$,
where $D_i \ge D_j$, denoted as $INC_{i,j}(m)$, is a function which maps a time value to another
such that

$$
INC_{i,j}(m) =
\begin{cases}
m & \text{if } D_i = D_j, \\
C_i(m) & \text{if } D_j \to D_i \text{ is in DPH}, \\
INC_{k,j}(INC_{i,k}(m)) & \text{otherwise, where } D_j \to \cdots \to D_k \to D_i \text{ is in DPH}.
\end{cases}
$$
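A sketch of C and INC in the same style follows. It assumes that the base case applies the function C of the higher partition of each adjacent pair, which is how the definition reads together with the procedure ProcC in Section 2.2.3; the `(I, C)` data layout and the inclusive "active" test are likewise assumptions.

```python
def latest_active_commit(txns, m):
    """C_i(m): txns is a list of (I, C) pairs for transactions assigned to D_i."""
    # Assumption: t is active at m when I(t) <= m <= C(t).
    active = [c for i, c in txns if i <= m <= c]
    return max(active) if active else m


def inc(path, txns_by_partition, m):
    """INC_{i,j}(m) for path = [D_i, ..., D_j], listed from the higher end D_i
    down to D_j. C is applied for every partition except the last one
    (assumed reading of the base case)."""
    for partition in path[:-1]:
        m = latest_active_commit(txns_by_partition[partition], m)
    return m


# Hypothetical transactions; Di is above Dk, which is above Dj.
txns = {"Di": [(3, 10)], "Dk": [(8, 15), (9, 12)]}
ts = inc(["Di", "Dk", "Dj"], txns, 4)  # C_k(C_i(4)) = C_k(10) = 15
```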


2.2.2.    Hierarchical  Timestamp  Protocol 

Now we introduce the hierarchical timestamp protocol, where synchronization timestamps
are calculated according to functions DEC and INC above.
Hierarchical Timestamp Protocol For Update Transactions:

For every database access request from an update transaction $t \in D_i$ for a data granule
$d \in D_j$, the following protocol is observed:

Protocol E (for accessing a partition equal to home partition)

If $D_i = D_j$, then (equivalent to the basic multi-version timestamping)

(1)  If it is a read request, then grant the latest version before $I(t)$ of d, and leave $I(t)$
as the read timestamp of this version of d if its current read timestamp is smaller than $I(t)$.

(2)  If it is a write request, then if the read timestamp of the latest version before $I(t)$ of
d is smaller than $I(t)$, then create a new version of d with version number $I(t)$. Otherwise
abort t.
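Protocol E can be sketched for a single data element as follows. This is a minimal illustration, assuming unique positive timestamps and an initial version numbered 0; the class and method names are introduced only for this example.

```python
import bisect


class VersionedItem:
    """Versions of one data element, ordered by version number (a timestamp)."""

    def __init__(self, initial_value):
        self.numbers = [0]            # version numbers in ascending order
        self.values = [initial_value]
        self.read_ts = [0]            # read timestamp kept with each version

    def _latest_before(self, ts):
        # Index of the latest version whose number is smaller than ts.
        return bisect.bisect_left(self.numbers, ts) - 1

    def read(self, ts):
        k = self._latest_before(ts)
        # Rule (1): raise the read timestamp if it is smaller than ts.
        self.read_ts[k] = max(self.read_ts[k], ts)
        return self.values[k]

    def write(self, ts, value):
        k = self._latest_before(ts)
        if self.read_ts[k] >= ts:
            return False              # rule (2): the writer must abort
        self.numbers.insert(k + 1, ts)
        self.values.insert(k + 1, value)
        self.read_ts.insert(k + 1, 0)
        return True


item = VersionedItem("v0")
first_read = item.read(10)          # leaves read timestamp 10 on version 0
late_write = item.write(5, "x")     # rejected: version 0 was read at 10 > 5
ok_write = item.write(12, "y")      # accepted: 10 < 12
```

The rejected write mirrors the situation in the introductory example, where a writer with a smaller timestamp arrives after a reader with a larger one.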

Protocol H (for accessing higher partitions)

If $D_j > D_i$, then grant t access to the latest version before $DEC_{i,j}(I(t))$ of d.

Protocol L (for accessing lower partitions)
If $D_i > D_j$, then

(1)  If it is a read request, then grant the latest version before $INC_{i,j}(I(t))$ of d, and
leave $INC_{i,j}(I(t))$ as the read timestamp of this version of d if its current read timestamp
is smaller than $INC_{i,j}(I(t))$.

(2)  If it is a write request, then if the read timestamp of the latest version before
$INC_{i,j}(I(t))$ of d is smaller than $INC_{i,j}(I(t))$, then create a new version of d with
version number $INC_{i,j}(I(t))$. Otherwise abort t.
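The choice among the three protocols depends only on how the target partition compares with the home partition. The sketch below shows that dispatch; the callables `is_higher`, `dec` and `inc` are assumed to be supplied by the caller, and the ones defined at the end are crude stand-ins used only to exercise the function.

```python
def access_timestamp(home, target, init_ts, is_higher, dec, inc):
    """Return (protocol, timestamp, leaves_trace) for one access by t.

    home:    home partition D_i of the transaction
    target:  partition D_j holding the requested data granule
    init_ts: initiation timestamp I(t)
    """
    if target == home:
        return "E", init_ts, True
    if is_higher(target, home):
        # Protocol H: read an old version, no timestamp is left on the data.
        return "H", dec(home, target, init_ts), False
    # Protocol L: the INC timestamp is used for the read/write bookkeeping.
    return "L", inc(home, target, init_ts), True


# Stand-ins for illustration only (hypothetical chain D3 -> D2 -> D1).
rank = {"D1": 3, "D2": 2, "D3": 1}


def chain_is_higher(x, y):
    return rank[x] > rank[y]


def fake_dec(lower, higher, m):
    return m - 3


def fake_inc(higher, lower, m):
    return m + 5


plan_h = access_timestamp("D2", "D1", 10, chain_is_higher, fake_dec, fake_inc)
plan_l = access_timestamp("D2", "D3", 10, chain_is_higher, fake_dec, fake_inc)
```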

2.2.3.    Implementing  HTS  Protocols 

To implement the HTS protocol, we must specify how $C(t)$ is assigned and how $I_i(m)$ and
$C_i(m)$ are computed operationally. We will first introduce two procedures: (1) $ProcC_i$, used
for computing $C_i(m)$, and (2) $ProcI_i$, used for computing $I_i(m)$. The procedure $ProcC_i(m)$
adds a constant time quantum $q_i > 0$, generic to a data partition $D_i$, to the argument value.
In addition, it remembers that the procedure has been invoked for time value m so as to constrain
future assignment of $C(t)$ for $t \in D_i$. The procedure $ProcI_i(m)$ computes the initiation
timestamp of the oldest active transaction in $D_i$ at time m. With ProcC and ProcI defined, the
timestamps used in the HTS protocols are computed as follows. Let the path in DPH between two data
partitions $D_i$ and $D_j$ be $D_i \to D_k \to \cdots \to D_j$. The timestamp $DEC_{i,j}(m)$ used
in Protocol H is computed as $ProcI_j(\ldots(ProcI_k(m))\ldots)$, and that used in Protocol L is
computed as $ProcC_k(\ldots(ProcC_j(m))\ldots)$. These procedures are consistent with the
definitions of the functions DEC and INC given in the previous subsection.

These  procedures  are  described  as  follows: 

ProcC_i(m): Procedure.

If m + q_i is less than current time, then return ("Abort Requestor");
Mark m;    /* remembers that ProcC_i has been invoked with argument m */
Create a pseudo transaction t' in D_i s.t. I(t') = m and C(t') = m + q_i;
Return (m + q_i).

The commit timestamp $C(t)$ is computed for $t \in D_i$, where t finishes at time m, as follows:
If there exists an invocation of $ProcC_i(m')$ s.t. $I(t) < m' < m - q_i$, then $C(t)$ is assigned
the minimum of $m' + q_i$ over all such m'. Otherwise $C(t) = m$.
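The procedure ProcC and the commit-timestamp rule can be sketched together as bookkeeping for a single partition. The class below is an illustration under assumptions: the current time is passed in explicitly as `now`, and an aborted request is signalled by returning `None`.

```python
class PartitionClock:
    """Bookkeeping for one data partition D_i with time quantum q_i."""

    def __init__(self, q):
        self.q = q
        self.marked = []   # arguments m of past ProcC_i invocations
        self.pseudo = []   # pseudo transactions as (I, C) pairs

    def proc_c(self, m, now):
        if m + self.q < now:
            return None    # corresponds to "Abort Requestor"
        self.marked.append(m)
        self.pseudo.append((m, m + self.q))
        return m + self.q

    def commit_timestamp(self, init_ts, finish):
        """C(t) for a transaction with I(t) = init_ts finishing at `finish`."""
        candidates = [m + self.q for m in self.marked
                      if init_ts < m < finish - self.q]
        return min(candidates) if candidates else finish


clock = PartitionClock(q=5)
granted = clock.proc_c(12, now=14)      # 12 + 5 >= 14, so 17 is returned
refused = clock.proc_c(3, now=14)       # 3 + 5 < 14, the requestor aborts
long_txn = clock.commit_timestamp(10, 50)   # pushed back to 12 + 5 = 17
short_txn = clock.commit_timestamp(10, 16)  # 12 is not below 16 - 5, stays 16
```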

In other words, if $ProcC_i$ has not been invoked in $D_i$ during the lifetime of t, then the
commit timestamp of $t \in D_i$ is simply t's finish time. When such invocations exist, and at
least one of them was invoked with an argument value which is at least $q_i$ time units before t's
finishing time (which also implies that t is at least $q_i$ long), then t's commit timestamp is
pushed backwards. It is noted that if the time quanta $q_i$ are selected to be large enough that
most of the transactions in $D_i$ that require accesses to lower data partitions would do so
within $q_i$ units of time after they start, the chance of having to abort such transactions due
to aborted $ProcC_i$ procedures would be relatively small, and virtually all $C(t)$ would be the
same as the actual finish time of t.

The procedure $ProcI_i(m)$ is defined as follows:

ProcI_i(m):

If there exists an unfinished transaction t at time m s.t. its finish time is unknown (i.e., t is
not finished at the time ProcI_i(m) is invoked) and it is known that C(t) is smaller than m, then
suspend requestor till t finishes;
Compute the initiation time of the oldest active transaction at time m as m';
Return (m').
It is noted that $ProcI_i$ would suspend the requestor only if there exists a previous
invocation of $C_i(m')$ in $D_i$ for some $m' < m - q_i$ s.t. transactions started before m' have
not finished by the time $ProcI_i(m)$ is invoked. Note that $ProcI_i(m)$ is always invoked at a
time later than m. Therefore, for appropriately large $q_i$'s the chance of an invocation of
$ProcI_i(m)$ needing to block is very small.

To ensure that the procedure $I_i(m)$ is implementable, we argue that when $I_i(m)$ is invoked
to compute an access timestamp, all information needed to correctly compute $I_i(m)$ is available.
Since $I_i(m)$ is always invoked at a time later than m, it is sufficient to argue that at time m,
all information needed to correctly compute $I_i(m)$ is available:

At time m, all regular transactions in $D_i$ with $I(t) < m$ would have been known. Therefore
we only need to consider pseudo transactions. A pseudo transaction $t'$ with $I(t') < m$ would be
inserted at time $m_f > m$ only if $I(t') > m_f - q_i > m - q_i$. If such $t'$ is inserted as a
result of computing $INC_{i,j}(x)$, then there must exist a transaction $t''$ in $D_i$ s.t.
$I(t'') = I(t')$ and at time $m_f$ it is still unfinished, as it is making a request to a lower
data segment at time $m_f$. Therefore the effect of the pseudo transaction $t'$ on $I_i(m)$ would
be dominated by that of $t''$, which is known by time m. If such $t'$ is inserted as a result of
computing $INC_{k,j}(x)$ where $D_i$ is on the path between $D_k$ and $D_j$, then due to the
behavior of ProcC the argument that $ProcC_i$ receives must be greater than the time when
$ProcC_i$ is invoked. Therefore a pseudo transaction $t'$ with $I(t') < m$ must be known by time
m. Therefore all regular and pseudo transactions that can influence the value of $I_i(m)$ are
known by m. All transactions known to be unfinished at time m whose commit timestamps would be
smaller than m are also known by time m, due to the rule for assigning $C(t)$ to values other than
t's actual finish time. Therefore all information on transactions needed to compute $I_i(m)$ is
known by time m.

2.2.4.    Remarks 

To  summarize,  several  observations  are  made  of  this  set  of  protocols: 
(1)  If the database partition hierarchy DPH consists of a single data partition, then only
Protocol E will apply, and the hierarchical timestamp algorithm is reduced to the conventional
MVTS algorithm.

(2)  Since  no  transaction  will  write  a  data  partition  higher  than  its  own  home  partition,  Protocol 
H  needs  to  cover  only  read  accesses.  More  importantly,  Protocol  H  is  "cheaper"  than  either 
Protocol  E  or  Protocol  L,  since  it  does  not  require  timestamping  the  data  element  accessed. 
The  key  benefit  of  the  HTS  algorithm  comes  from  choosing  a  database  partition  such  that 
Protocol  H  is  used  much  more  than  Protocol  L. 

(3)  Protocol L uses timestamps which are different from the timestamps of the accessing
transactions. Since $INC_{i,j}(I(t)) \ge I(t)$, Protocol L is the most "expensive" among the
three, as it tends to increase the chance for transactions in lower data partitions to abort. In
fact, the larger $q_i$ is, the better off are transactions in $D_i$ when they access lower data
partitions, as there is less probability for them to abort; and the worse off are the transactions
belonging to lower data partitions. Offering tradeoffs is intrinsic to hierarchical timestamping.
System designers may reflect their perceived priorities among all transactions by their choice of
such parameters as $q_i$.

3.    Proof  of  Correctness 

3.1.    Proof  Structure 

To  show  that  the  above  algorithm  is  correct,  one  must  show  that  serializability  is  enforced. 
We  first  formulate  the  problem  of  testing  serializability,  and  then  apply  it  to  the  HTS  algorithm. 

Definition. A schedule is a sequence of steps, each of which is in the form of a tuple
<transaction id, action, version of a data granule>. The action can be read (r) or write (w). The
version of a data granule is denoted as $d^v$, where d indicates the data granule and v indicates
the version. If the action is write, then the version of the data granule included in the step is
created by the transaction. If the action is read, then the transaction reads the version of the
data
granule  indicated  in  the  tuple.    An  example  of  a  schedule  is  <<,,w,(/'>,  <t2,T,d'>,  <t2,v/,d^  >,
<t^,T,d'>. 

Definition. Assume that a version order, denoted as <<, is given. A version j of a data
element d is the predecessor of a version k of d if j << k and there exists no version i of d such
that j << i << k.

Definition.  A  transaction  dependency  graph  of  a  schedule  S  is  a  directed  graph,  denoted  as 
TG(S),  where  the  nodes  are  the  transactions  in  S  and  the  arcs,  representing  direct  dependencies 
between  transactions,  exist  according  to  the  following  rule: 

$t_1 \to t_2$ is in TG(S) iff

(1)  $<t_1, w, d^j>$ and $<t_2, r, d^j>$ are in S for some $d^j$, or

(2)  $<t_1, r, d^j>$ and $<t_2, w, d^k>$ are in S for some $d^j$, $d^k$ where $d^j$ is the
predecessor of $d^k$. ($d^j$ is a predecessor of $d^k$ if j < k and there exists no $d^o$ such
that j < o < k.)

The following theorem is adapted from the 1-Serializability Theorem proven in
[Bernstein83]:

Theorem. Given a version order <<, a schedule S is serializable if TG(S) is acyclic.
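The test suggested by the theorem can be sketched as follows. The code assumes that steps are given as `(transaction, action, item, version)` tuples, that the version order coincides with the numeric order of version numbers, and that every version other than an initial one is written within the schedule; the two schedules at the end are made-up examples.

```python
def dependency_arcs(schedule):
    """Arcs of TG(S) for steps (txn, action, item, version)."""
    writers = {}    # (item, version) -> transaction that created it
    versions = {}   # item -> list of version numbers written in S
    for txn, action, item, ver in schedule:
        if action == "w":
            writers[(item, ver)] = txn
            versions.setdefault(item, []).append(ver)
    for written in versions.values():
        written.sort()
    arcs = set()
    for txn, action, item, ver in schedule:
        if action != "r":
            continue
        creator = writers.get((item, ver))
        if creator is not None and creator != txn:
            arcs.add((creator, txn))            # rule (1)
        later = [v for v in versions.get(item, []) if v > ver]
        if later and writers[(item, later[0])] != txn:
            arcs.add((txn, writers[(item, later[0])]))   # rule (2)
    return arcs


def is_acyclic(nodes, arcs):
    """Kahn-style check: repeatedly remove nodes without incoming arcs."""
    indeg = {n: 0 for n in nodes}
    for _, b in arcs:
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in arcs:
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed == len(indeg)


good = [("t1", "w", "d", 1), ("t2", "r", "d", 1),
        ("t2", "w", "d", 2), ("t3", "r", "d", 1)]
bad = [("t1", "r", "x", 0), ("t2", "w", "x", 1),
       ("t2", "r", "y", 0), ("t1", "w", "y", 1)]
good_arcs = dependency_arcs(good)
bad_arcs = dependency_arcs(bad)
```

In the second schedule each transaction reads the version that the other one overwrites, which yields a two-node cycle.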

Given  the  above  rule  for  testing  for  serializability,  the  following  steps  are  devised  to  prove 
correctness  of  the  HTS  algorithm: 

(1)  Define a relation topologically follows, denoted as =>, between any pair of transactions
that are assigned to related data partitions in DPH.

(2)  Show that direct dependencies may occur only between transactions that are assigned to
partitions that are related in DPH.

(3)  Show that if a schedule S enforces the relation => on all direct transaction dependencies
(i.e., $t_2 \to t_1 \in TG(S)$ only if $t_1$ => $t_2$), then the transaction dependency graph
TG(S) has no cycle.

(4)  Show that the HTS algorithm produces only schedules that enforce the relation => on all
direct transaction dependencies.

The proof structure is important in understanding how the implementation details of the
HTS protocols may be modified without having to reconstruct proofs from scratch. In particular,
much of the proof is accomplished by referencing solely the definitions of the functions DEC and
INC, not their implementations. The only exception is in the last step (i.e., step (4)), a key
step of the proof in this paper. Only a small portion of its proof directly cites the
implementation of the protocol. This portion will be clearly identified.

3.2.    The  topologically-follows  Relation  and  its  Properties 

We  now  define  topologically  follows,  which  assumes  a  given  data  partition  hierarchy. 

Definition. A relation topologically follows, denoted as =>, is defined for a pair of
transactions t_1, t_2, where t_1 ∈ D_i, t_2 ∈ D_j, and D_i and D_j are related in the chosen data
partition hierarchy. We say that t_1 topologically follows t_2 (or t_1 => t_2) iff

(1)  if D_i = D_j then I(t_1) > I(t_2),

(2)  if D_i > D_j then I(t_1) > DEC_{ji}(I(t_2)),

(3)  if D_i < D_j then I(t_2) < DEC_{ij}(I(t_1)).

Clearly, => is defined only between transactions that belong to data partitions that are
related in DPH, because otherwise the DEC function is undefined.
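
The three cases of the definition can be written as a single predicate. The Python sketch below
is only illustrative: it assumes that partitions form a single chain identified by an integer
level (larger means higher), and it takes the DEC function as a caller-supplied argument `dec`,
since its actual definition is given elsewhere in the paper. The names `topologically_follows`,
`level`, `init_ts` and `toy_dec` are hypothetical.

```python
def topologically_follows(t1, t2, level, init_ts, dec):
    """Return True iff t1 => t2 under the three-case definition.

    level[t]         -- level of the partition t is assigned to (larger = higher)
    init_ts[t]       -- initiation timestamp I(t)
    dec(src, dst, m) -- stand-in for DEC from the lower level src to the higher level dst
    """
    li, lj = level[t1], level[t2]
    if li == lj:                                    # case (1): same partition
        return init_ts[t1] > init_ts[t2]
    if li > lj:                                     # case (2): t1 is in the higher partition
        return init_ts[t1] > dec(lj, li, init_ts[t2])
    return init_ts[t2] < dec(li, lj, init_ts[t1])   # case (3): t1 is in the lower partition


def toy_dec(src, dst, m):
    """Made-up non-decreasing stand-in for DEC: two time units per level climbed."""
    return m - 2 * (dst - src)
```

With `level = {"t1": 2, "t2": 1}` and `init_ts = {"t1": 10, "t2": 11}`, the toy function gives
DEC(11) = 9, so t1 => t2 holds even though t1 started earlier, while t2 => t1 does not hold.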

Lemma. Given a DPH, let t_1 ∈ D_i and t_2 ∈ D_j. If t_2 → t_1 then D_i and D_j are related in
DPH.

Proof. Let d ∈ D_k be the contended data element that causes t_2 → t_1. Since at least one
transaction writes d, without loss of generality let us assume that it is t_1 that writes d. Then
by the definition of the data partition graph based on transaction analysis, either
D_k → D_i ∈ DSG or D_k = D_i. Since t_2 must at least read d, either D_j → D_k ∈ DSG or
D_j = D_k. This means that D_j →(=) D_k →(=) D_i in DSG. This means that if D_i ≠ D_j then
there exists a directed path between D_i and D_j in DSG. By the derivation of the data partition
hierarchy, there must be a directed path between D_i and D_j in DPH, which means that D_i and
D_j are related in DPH. Q.E.D.

In  [Hsu86]  the  following  theorem  concerning  the  partial  ordering  effect  of  the  relation  => 
was  proven: 

Theorem 1. Let TG(S) be a transaction dependency graph of a set of update transactions
run on a database. Let S enforce the synchronization rule that, given a DPH, t_2 → t_1 ∈ TG(S)
only if t_2 => t_1. Then TG(S) has no cycles.

In fact, a statement stronger than the above was proven in [Hsu86]. What was proven was
that if S enforces the rule that t_2 → t_1 ∈ TG(S) only if t_2 =>_{TS,AL} t_1, then TG(S) has no
cycle, where =>_{TS,AL} is defined to be topologically follows with respect to two functions TS
and AL_{ij}, with the requirements that TS maps each transaction to a unique time value, and
AL_{ij} satisfies the following properties:

(1)  Composability: AL_{jk}(AL_{ij}(m)) = AL_{ik}(m) for all times m and for all D_i, D_j and D_k
where D_k > D_j > D_i.

(2)  Non-decreasing: AL_{ij}(m) ≥ AL_{ij}(m') for all D_i and D_j where D_j > D_i and for all
times m, m' where m > m'.

In essence, the relation => defined here is an instance of =>_{TS,AL} where the function TS
has been assigned the initiation timestamp I of transactions, and AL_{ij} has been assigned
DEC_{ij}. It is easy to verify that I and DEC_{ij} satisfy the requirements for TS and AL.

3.3.    HTS  Enforces  topologically-follows 

Given the above theorems stating that the relation => can be used as a vehicle for ordering
transactions for concurrency control purposes, the following theorem completes our proof of
correctness.

Theorem 2. Let S be a schedule that is permitted by the hierarchical timestamp protocol.
Then t_2 → t_1 ∈ TG(S) only if t_2 => t_1.
Proof. Let t_2 ∈ D_j and t_1 ∈ D_i. t_2 → t_1 is translated into the following three cases:

(1)  t_2 reads a data element d ∈ D_k written (created) by t_1.

(2)  t_2 writes (i.e., creates a new version of) a data element d ∈ D_k whose predecessor was
read by t_1.

(3)  t_2 writes (creates a new version of) a data element d ∈ D_k whose predecessor was written
(created) by t_1 and the version created by t_1 is read by another transaction.

What has to be shown is that in all three cases t_2 => t_1 holds. In all cases, the following two
lemmas, which bind the functions DEC and INC together, are used to transform the timing
relationship imposed by the protocol to that defined by the relation =>:

Lemma 1. DEC_{ij}(INC_{ji}(m)) ≤ m, for all D_j > D_i in DPH.

Lemma 2. DEC_{ij}(INC_{ji}(m)+ε) > m, for all D_j > D_i in DPH, for all ε > 0.

We will first prove these two lemmas.

Lemma  1. 

Proof. We first prove that I_i(C_i(m)) ≤ m for all i, m. If there exists no transaction active at
time m in D_i then C_i(m) = m, and I_i(C_i(m)) = I_i(m) = m. If there exists at least one
transaction active at time m in D_i, let the transaction with the largest commit timestamp among
all that are active at time m be t_a; then C_i(m) = C(t_a). Therefore, at time C_i(m), there is at
least one transaction active (since at least t_a is active), and the oldest active transaction at
that time must be at least as old as t_a, where t_a was active at time m, which means I(t_a) ≤ m.
Therefore I_i(C_i(m)) ≤ I(t_a) ≤ m. Therefore we conclude that I_i(C_i(m)) ≤ m for all i, m.

Let the critical path between D_i and D_j be D_i → D_{i1} → D_{i2} → ... → D_{in} → D_j. Then by
expanding according to the definition of DEC and INC,
DEC_{ij}(INC_{ji}(m)) = I_j(I_{in}(...(I_{i2}(I_{i1}(C_{i1}(C_{i2}(...(C_{in}(C_j(m)))...))))...)).
Let C_{i2}(...(C_{in}(C_j(m)))...) = m_{i2}; then we have I_{i1}(C_{i1}(m_{i2})) ≤ m_{i2}. Therefore
DEC_{ij}(INC_{ji}(m)) ≤ I_j(I_{in}(...(I_{i2}(m_{i2}))...)) = I_j(I_{in}(...(I_{i2}(C_{i2}(...(C_{in}(C_j(m)))...)))...)).
Continuing with the same reasoning, we have DEC_{ij}(INC_{ji}(m)) ≤ I_j(C_j(m)) ≤ m. Q.E.D.
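
The single-partition step of this proof can be checked on a toy example. The Python sketch below
models the transactions of one partition as (initiation, commit) pairs and treats a transaction
as active at time m when initiation ≤ m ≤ commit; this closed-interval convention, the tuple
encoding and the function names are assumptions of the sketch, chosen to match how the proof
uses I_i and C_i.

```python
def active(txns, m):
    """Transactions (start, commit) of one partition that are active at time m."""
    return [(s, c) for s, c in txns if s <= m <= c]


def I(txns, m):
    """I_i(m): initiation time of the oldest transaction active at m, or m if none."""
    act = active(txns, m)
    return min(s for s, _ in act) if act else m


def C(txns, m):
    """C_i(m): largest commit time among transactions active at m, or m if none."""
    act = active(txns, m)
    return max(c for _, c in act) if act else m


# Toy partition with three overlapping transactions.
partition = [(1, 4), (3, 6), (5, 9)]
for m in range(0, 12):
    assert I(partition, C(partition, m)) <= m
```

For instance, at m = 2 only (1, 4) is active, so C_i(2) = 4; at time 4 the oldest active
transaction started at 1, giving I_i(C_i(2)) = 1 ≤ 2.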
Lemma 2.

Proof. We first prove that I_i(C_i(m)+ε) > m for all i, m. There are two cases.

(a)  If no transaction is active at time m in D_i then C_i(m) = m. Therefore at time C_i(m)+ε the
oldest active transaction, if any, cannot have started on or before m. Therefore I_i(C_i(m)+ε) > m.

(b)  If some transaction is active at time m in D_i, then let t_a be defined the same way as in
the proof of Lemma 1; we have C_i(m) = C(t_a). Therefore at time C(t_a)+ε, no transaction active
at time m is still active; that is, I_i(C(t_a)+ε) > m, or I_i(C_i(m)+ε) > m.

Combining (a) and (b) one concludes that I_i(C_i(m)+ε) > m for all i, m. Let the path between
D_i and D_j be D_i → D_{i1} → D_{i2} → D_{i3} → ... → D_{in} → D_j. Let
C_{i3}(...(C_{in}(C_j(m)))...) = m_{i3} and let C_{i2}(C_{i3}(...(C_{in}(C_j(m)))...)) = m_{i2}.
That is, m_{i2} = C_{i2}(m_{i3}). By the above conclusion, we have
I_{i1}(C_{i1}(C_{i2}(m_{i3}))+ε) > C_{i2}(m_{i3}). That is,
I_{i1}(C_{i1}(C_{i2}(m_{i3}))+ε) ≥ C_{i2}(m_{i3})+ε' for some ε' > 0. Therefore, applying I_{i2}
to both sides, I_{i2}(I_{i1}(C_{i1}(C_{i2}(m_{i3}))+ε)) ≥ I_{i2}(C_{i2}(m_{i3})+ε') > m_{i3}.
Continuing with the same reasoning one derives
DEC_{ij}(INC_{ji}(m)+ε) = I_j(I_{in}(...(I_{i1}(C_{i1}(...(C_{in}(C_j(m)))...)+ε)...)) > m. Q.E.D.
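
Both lemmas can be exercised numerically on a small path. The Python sketch below composes the
per-partition functions along a path in the way the two proofs expand DEC and INC: DEC applies
I for each partition after D_i in path order, and INC applies C in the reverse order. The
closed-interval notion of "active", the list-of-partitions encoding and all names are
assumptions of this sketch, not the protocol's implementation.

```python
def active(txns, m):
    """Transactions (start, commit) of one partition that are active at time m."""
    return [(s, c) for s, c in txns if s <= m <= c]


def I(txns, m):
    """Initiation time of the oldest transaction active at m, or m if none."""
    act = active(txns, m)
    return min(s for s, _ in act) if act else m


def C(txns, m):
    """Largest commit time among transactions active at m, or m if none."""
    act = active(txns, m)
    return max(c for _, c in act) if act else m


def dec(path, m):
    """DEC along the path [D_i1, ..., D_in, D_j]: apply I of D_i1 first, I of D_j last."""
    for txns in path:
        m = I(txns, m)
    return m


def inc(path, m):
    """INC along the same path in the opposite direction: C of D_j first, C of D_i1 last."""
    for txns in reversed(path):
        m = C(txns, m)
    return m


# Two partitions above D_i, each with its own toy transactions.
path = [[(1, 4), (3, 6)], [(2, 5), (7, 8)]]
for m in range(0, 12):
    assert dec(path, inc(path, m)) <= m              # Lemma 1
    for eps in (0.5, 0.25):
        assert dec(path, inc(path, m) + eps) > m     # Lemma 2
```

At m = 3, for example, INC yields 6, DEC(6) = 2 ≤ 3, and DEC(6.5) = 6.5 > 3.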

Now  we  are  ready  to  complete  the  proof  for  Theorem  2.    For  the  three  cases  identified  pre- 
viously: 

(1)  In this case, D_i ≥ D_k. Since D_i and D_j must be related, we have three cases, and for
each case observing the hierarchical timestamp protocol leads to the conclusion that t_2 => t_1.

(a) D_j ≥ D_i ≥ D_k. In this case, obeying HTS leads to the assertion
INC_{jk}(I(t_2)) > INC_{ik}(I(t_1)) (0.1). Therefore INC_{jk}(I(t_2)) ≥ INC_{ik}(I(t_1))+ε for
some ε > 0. Applying function DEC_{ki} to both sides we obtain
DEC_{ki}(INC_{jk}(I(t_2))) ≥ DEC_{ki}(INC_{ik}(I(t_1))+ε) (0.2). From Lemma 2 above the
right-hand side of (0.2) is greater than I(t_1). Applying function DEC_{ij} to both sides we have
DEC_{ij}(DEC_{ki}(INC_{jk}(I(t_2)))) ≥ DEC_{ij}(DEC_{ki}(INC_{ik}(I(t_1))+ε)) > DEC_{ij}(I(t_1)).
However, the left-hand side of the above expression equals DEC_{kj}(INC_{jk}(I(t_2))), which by
Lemma 1 is less than or equal to I(t_2). Therefore I(t_2) > DEC_{ij}(I(t_1)), which means
t_2 => t_1.

(b) D_i > D_j ≥ D_k. In this case (0.1) holds also. Applying DEC_{ki} to both sides of (0.1) we
have DEC_{ji}(DEC_{kj}(INC_{jk}(I(t_2)))) ≥ DEC_{ki}(INC_{ik}(I(t_1))+ε). The right-hand side is
> I(t_1). The left-hand side is ≤ DEC_{ji}(I(t_2)). Therefore DEC_{ji}(I(t_2)) > I(t_1), which
means t_2 => t_1.

(c) D_i ≥ D_k > D_j. In this case INC_{ik}(I(t_1)) < DEC_{jk}(I(t_2)), i.e.,
INC_{ik}(I(t_1))+ε ≤ DEC_{jk}(I(t_2)). Applying DEC_{ki} to both sides we have
I(t_1) < DEC_{ji}(I(t_2)), i.e., t_2 => t_1.

(2)  In this case, D_j ≥ D_k. By the same reasoning as in (1) we can show t_2 => t_1 for the
cases of D_i > D_j ≥ D_k and D_j ≥ D_i ≥ D_k. Now we consider the case where D_j ≥ D_k > D_i.
Let the time at which t_2 performs writing d be m. Consider two cases.

(a) t_1's request to read d arrives at or before time m, which means t_1 had started at or
before m, i.e., I(t_1) ≤ m and DEC_{ik}(I(t_1)) ≤ I_k(m) ≤ m (0.3).

(a.1) D_j > D_k. In this case, by Protocol L, INC_{jk}(I(t_2)) > m, for otherwise the write
request would have been rejected (in ProcC_j(I(t_2))). Combining this with (0.3) we have
INC_{jk}(I(t_2)) > DEC_{ik}(I(t_1)). Applying DEC_{kj} to both sides we get
I(t_2) > DEC_{ij}(I(t_1)), i.e., t_2 => t_1.

(a.2) D_j = D_k. In this case, INC_{jk}(I(t_2)) = I(t_2). Since t_2 makes a request at time m,
t_2 is active at time m (and therefore I_k(m) ≤ I(t_2)), except when t_2 satisfies conditions
under which its commit timestamp is to be pushed backwards to be before m. However, when such
conditions hold the ProcI_k(m') procedure for computing DEC_{ik}(I(t_1)) for m' < m would have
to block until t_2 finishes, which is after m, contradicting the given that t_1's request to read
d arrives before m. Therefore I_k(m) ≤ I(t_2). Therefore
INC_{jk}(I(t_2)) ≥ I_k(m) ≥ DEC_{ik}(I(t_1)). By the same reasoning as in (a.1) we have
t_2 => t_1.

(b) t_1's request to read d arrives after m. Then the version created by t_2 already existed,
and by Protocol H, if t_1 chooses to read a version before that created by t_2 it must be that
INC_{jk}(I(t_2)) > DEC_{ik}(I(t_1)). Therefore t_2 => t_1.

(3)  In this case, it suffices to prove that t_2 => t_1 when t_2 creates a new version of d whose
predecessor is created by t_1. Since in this case D_i ≥ D_k and D_j ≥ D_k, the two relevant cases
are already shown in (1) and therefore t_2 => t_1. Q.E.D.

The proof of Theorem 2 is largely independent of the implementation procedures for ProcC and
ProcI except for arguments in (2)(a). The dependency arises due to our desire to allow Protocol H
in HTS not to have to leave read timestamps. If HTS is modified such that Protocol H
also required read timestamps, then the proof can be constructed entirely from the definitions of
functions DEC and INC and their properties without resorting to implementations.

4.    Discussion  of  Performance 

Performance of a concurrency control algorithm can be assessed along two dimensions. One
is the computational overhead, such as setting locks and timestamping data elements. The other
is the level of concurrency, or "optimality" of the algorithm. Along both dimensions, the HTS
algorithm potentially performs better than the MVTS algorithm, if a data partition hierarchy can
be found such that the frequency of accesses from transactions assigned to lower partitions to
data elements contained in higher data partitions dominates that of the reverse direction. In
other words, if the data partition hierarchy is chosen such that Protocol H (the "cheap" protocol)
is used much more often than Protocol L (the "expensive" protocol), then a net saving may be
achieved to warrant the use of this algorithm.

The HTS algorithm requires a transaction analysis and the overhead of maintaining selective
information on transaction initiation and commit times to enable computation of the A function
and enforcement of the B function. However, the fact that Protocol H does not require
timestamping the data elements makes it very attractive for situations where transactions
assigned to a lower partition need to access a certain quantity of data in higher data partitions.
Therefore the HTS algorithm can potentially produce a net saving in computational overhead if
Protocol H is used "often enough."

Now we analyze the optimality aspect of the algorithm. In [Kung79], optimality of a
concurrency control algorithm is defined as the degree to which the algorithm allows for
"serializable input schedules" to proceed without being interrupted. Intuitively, an input
schedule, which is an interleaved sequence of read and write requests issued by a set of
transactions, is multi-version serializable (MVSR) if there exists a way of assigning versions to
these read and write requests such that the resulting transaction dependency graph is acyclic.
For reasons of implementability, no known multi-version concurrency control algorithm will allow
all MVSR input schedules to proceed without interruption. However, the degree to which an
algorithm allows MVSR schedules to proceed without interruption can be used as a useful measure
of level of concurrency.

In comparing the HTS algorithm to the MVTS algorithm, it can be argued that if Protocol
H is used much more frequently than Protocol L, then the expected number of restarts (i.e.,
transaction aborts) under the HTS algorithm would be smaller than that under the MVTS algorithm.
While further studies are needed to quantify this statement, the following theorem confirms that,
in the extreme case when Protocol L is not used at all, the HTS algorithm dominates the MVTS
algorithm in terms of optimality.

Lemma 3. If the data partition graph DPG(P,T_u) of some database decomposition P is
acyclic, then among the set of correct data partition hierarchies DPH(P,T_u), there exists a DPH
such that Protocol L is not needed to process update transactions in T_u. (This can be shown by
assigning to DPH the partial order of data partitions exhibited in DPG(P,T_u); since no path
exists in DPG(P,T_u) from a higher data partition to a lower data partition, there exist no
accesses from transactions assigned to higher data partitions to the lower data partitions, and
therefore Protocol L is never needed.)
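
The construction in the parenthetical argument can be sketched as a topological layering of the
data partition graph. In the Python sketch below, an arc (low, high) is assumed to mean that some
transaction assigned to partition `low` accesses data in partition `high`; this arc direction, the
integer levels and the function name `assign_levels` are assumptions of the example rather than
definitions from the paper. The standard-library `graphlib` module (Python 3.9 or later) raises
`CycleError` when the graph is not acyclic.

```python
from graphlib import CycleError, TopologicalSorter


def assign_levels(arcs):
    """Give every partition an integer level so that each arc goes from a lower level
    to a strictly higher one. Raises CycleError if the graph has a cycle."""
    preds = {}
    for low, high in arcs:
        preds.setdefault(high, set()).add(low)
        preds.setdefault(low, set())
    level = {}
    # static_order() yields a node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        level[node] = 1 + max((level[p] for p in preds[node]), default=-1)
    return level


arcs = [("D3", "D2"), ("D2", "D1"), ("D3", "D1")]
levels = assign_levels(arcs)
# Every access goes upward, so no access from a higher to a lower partition remains.
assert all(levels[low] < levels[high] for low, high in arcs)
```

Under these assumptions every recorded access runs from a lower level to a higher one, which is
the situation in which only the cheaper protocols would be exercised.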

Theorem 3. Let P be a database decomposition and DPG(P,T_u) be acyclic. Let the data
partition hierarchy DPH be such that Protocol L is never used in processing transactions in T_u.
Let S(MVTS) denote the set of serializable input schedules that are allowed for execution by
MVTS without interruption, and S(HTS) that by the HTS algorithm under the data partition
hierarchy DPH where both Protocol H and Protocol E are used. Then S(MVTS) is a subset of
S(HTS).

Proof. We must show that (1) any schedule allowed by MVTS must be allowed by HTS,
and (2) there exists at least one schedule allowed by HTS that is not allowed by MVTS.

(1)  Let a schedule S ∈ S(MVTS). We want to show that S ∈ S(HTS). Suppose this is not true.
Then there must exist a step (t_i,w,d) in S that HTS will abort. We will show that this is
impossible. Let t_i ∈ D_i. Let (t_j,r,d) < (t_i,w,d) denote the fact that step (t_j,r,d) is before
(t_i,w,d) in S. Since Protocol L is never used, the timestamp used to synchronize (t_i,w,d) by
HTS must be I(t_i). Therefore it suffices to show that there exists no (t_j,r,d) < (t_i,w,d) in S
that will cause the HTS algorithm to leave a read timestamp greater than I(t_i) with d.
This is true because for all (t_j,r,d) where t_j ∈ D_j, either D_j = D_i or D_i > D_j. In the
latter case, no read timestamp is left by HTS. In the former case, the read timestamp left by HTS
is equal to that left by MVTS, which must be smaller than I(t_i).

(2)  Consider the following schedule: S = (t_1,r,a),(t_2,r,a),(t_2,w,a),(t_1,w,b), where
t_2 ∈ D_i, a ∈ D_i, t_1 ∈ D_j, b ∈ D_j, D_i > D_j in DPH, and I(t_2) < I(t_1). Clearly
S ∈ S(HTS) and S is not a member of S(MVTS). Q.E.D.
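
The counterexample in part (2) can be replayed with a deliberately simplified simulator. The
Python sketch below models MVTS with the usual multi-version timestamp rule (a write is rejected
when the version it would follow carries a larger read timestamp) and models HTS only to the
extent that a read from a lower-partition transaction to higher-partition data leaves no read
timestamp. The time-wall computation of Protocol H is not modeled; that omission, the single
shared `run` function, the initial version with timestamp 0 and the level encoding are all
assumptions of this sketch and do not reproduce the actual protocols.

```python
def run(schedule, init_ts, txn_level=None, elem_level=None):
    """Return True if every step of the schedule is accepted.

    With txn_level/elem_level omitted, every read leaves a read timestamp (MVTS-like).
    With levels given, a read of higher-partition data leaves none (simplified Protocol H).
    """
    versions = {}  # element -> list of [write_ts, read_ts]
    for txn, op, elem in schedule:
        ts = init_ts[txn]
        vs = versions.setdefault(elem, [[0, 0]])  # assumed initial version
        # Version visible to this timestamp: the one with the largest write_ts <= ts.
        target = max((v for v in vs if v[0] <= ts), key=lambda v: v[0])
        if op == "r":
            no_trace = txn_level is not None and txn_level[txn] < elem_level[elem]
            if not no_trace:
                target[1] = max(target[1], ts)
        else:
            if target[1] > ts:      # a younger reader already saw the predecessor
                return False
            vs.append([ts, ts])
    return True


schedule = [("t1", "r", "a"), ("t2", "r", "a"), ("t2", "w", "a"), ("t1", "w", "b")]
init_ts = {"t1": 2, "t2": 1}                      # I(t_2) < I(t_1)
mvts_ok = run(schedule, init_ts)
hts_ok = run(schedule, init_ts, {"t1": 0, "t2": 1}, {"a": 1, "b": 0})
```

In this toy model the write of a by t_2 is rejected in the first run, because t_1's earlier read
left read timestamp 2 on the version it follows, and is accepted in the second run, where that
read left no trace.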

The above result implies that, given a database application, if the set of update transactions
T_u can be designed in such a way that a non-trivial database decomposition P can be found such
that the data partition graph DPG(P,T_u) is acyclic, then the HTS algorithm will definitely
perform better than the MVTS algorithm in terms of level of concurrency. When such a design
cannot be achieved, but a design is found that enables construction of a data partition hierarchy
where usage of Protocol H is much higher than that of Protocol L, the HTS algorithm can still be
expected to perform better than the MVTS algorithm.

In [Hsu83b], a simple analytical model was established for an application of the HTS
algorithm to a simple two-data-partition database. In the analysis, it is assumed that the higher
data partition is more heavily contended for by transactions than the lower data partition. The
simplicity of the case enables the use of simple analysis to obtain formulas for the rate of
blocking under the two-phase locking (2PL) algorithm due to conflict in the higher data
partition, and the rate of abort under the HTS algorithm. The analysis makes assumptions that are
in general in favor of the 2PL approach and discriminate against the HTS approach. The purpose is
to demonstrate the effect of the HTS algorithm in relieving contention in the higher data
partition, presumably the much more heavily contended data partition in our case. The result,
reported in [Hsu83b],
shows that, in general, the rate of blocking under 2PL is proportional to
a_1*MP_1^2 + a_2*MP_1*MP_2, where MP_1 is the number of type-1 transactions in the system at any
time (type-1 transactions are those assigned to the higher data partition), MP_2 is that of
type-2 transactions (those assigned to the lower data partition), and a_i is the number of data
elements in D_1 accessed by a typical transaction assigned to D_i, where i = 1, 2. In contrast,
the rate of abort under HTS is proportional only to a_1*MP_1^2. While the absolute values depend
on other parameters, one can conclude that HTS would be preferred when a_2*MP_2 is large.

5.    Conclusion 

The Hierarchical Timestamp Algorithm can be considered as a generalized timestamp
algorithm parameterized by a data partition hierarchy constructed via transaction analysis. It
relies on an understanding of the structural disciplines of an application system and represents
an attempt to take advantage of these disciplines. By being sensitive to the existence of such
structures and being flexible in manipulating the basic control tools (e.g., timestamps, locks),
this structural approach to concurrency control holds promise for improving applications or
activities in databases where the level of concurrency is vital to system performance.

The thrust of this paper thus goes beyond proposing a new concurrency control algorithm.
It demonstrates the potential benefit of exploiting the knowledge and the structure of the
application systems to implement more efficient and more tailored concurrency control mechanisms.
It also points out an area of research which examines the problem of how to design transactions
so that concurrency control can be more efficient without compromising the key requirements on
the data. It is believed that transaction design performed with concurrency control problems in
mind could produce a structure of applications for which concurrency control is significantly
less costly to implement.